1 Introduction and Objectives

Research  on  an  evolutionary  approach  to  designing  neural  networks  that  learn  was  begun  at 
SRI  International  (SRI)  in  July  1989  under  AFOSR  sponsorship  (SRI  Project  7929,  Contract 
No.  F49620-89-K0005).  This  report  describes  the  research  conducted  during  the  first  year  of 
the  project. 

The aim of this program is to design a system that can learn to recognize signals adaptively.
That is, the system should learn to respond in a distinctive, repeatable way to those
signals to which it has been exposed, should track changes to its signal environment (including
possibly the introduction of entirely new classes of signals), and should do these things
spontaneously, with no instruction. Adaptive signal recognition should be the result of
a self-reorganization of the system in the face of a changing environment. Our hypothesis
is  that  the  principles  of  biological  evolution  and  population  genetics  provide  the  basis  for 
such  behavior.  The  processes  of  variation,  selection,  and  differential  reproduction  are  known 
to  produce  in  natural  populations  the  kind  of  emergent  behavior  we  seek  to  emulate.  By 
simulating  these  processes  on  the  computer,  we  hope  to  observe  similar  kinds  of  behavior  in 
artificial  systems. 

Clearly, this approach requires powerful computing resources. We must simulate statistically
significant populations of networks on many inputs over many generations. While
such  an  approach  may  be  impractical  on  conventional  serial  computer  systems,  it  has  a  high 
degree  of  implicit  parallelism  that  can  be  exploited  with  suitable  hardware.  Not  only  can 
we  simulate  all  members  of  the  population  in  parallel,  but  the  classes  of  phenotypes  that  we 
study  have  an  internal  parallelism  that  we  can  exploit.  The  feedforward  perceptrons  with 
hidden  units,  for  example,  have  a  parallel  dataflow  structure.  SRI’s  Connection  Machine 
provides  a  nearly  ideal  computational  engine  for  our  work. 

Our purpose is not to model biological processes explicitly, but rather to explore a genetic
and  ecological  metaphor  of  computation.  We  are  interested  in  investigating  this  metaphor 
for  two  reasons.  First  of  all,  adaptive  behavior  may  lead  to  very  general  methods  of  dealing 
with  difficult  and  ill-defined  problems  in  signal  understanding.  A  system  that  can  learn  from 
experience  without  explicit  training  by  examples,  that  can  exploit  contextual  information, 
and  that  can  modify  itself  to  adapt  to  possibly  radical  changes  in  its  input  could  be  useful  for 
difficult  problems  such  as  speaker-independent  speech  recognition.  In  addition,  the  inherent 
parallelism  of  the  evolutionary  metaphor,  with  its  emphasis  on  populations,  can  lead  to 
effective  methods  for  exploiting  the  power  of  parallel  computer  systems. 

2  Status  of  the  Research 

2.1  Introduction 

The  problem  we  are  addressing  is  a  simplified  version  of  the  class  of  problems  we  will  tackle 
eventually.  We  describe  the  general  problem,  and  then  describe  the  simplified  version  we 
have  investigated  this  year. 

Suppose that we have a system, for the time being regarded as a “black box,” that receives
as input a signal vector of length $n$, $x = (x_0, \ldots, x_{n-1})$. These signals could be, for example,
speech waveforms. The components of $x$ are real numbers within some limited dynamic
range. In practice, since any measurement of a real signal will be uncertain to some degree,
we can represent the signal vector with nonnegative integers to a precision of $b$ bits. Each
possible signal is a point in the $n$-dimensional metric signal space.

Now  suppose  the  system  is  stimulated  only  by  a  much  smaller,  structured  ensemble  of 
signals  generated  by  a  few  unknown,  relatively  low-dimensional  physical  processes,  possibly 
corrupted by noise. These processes are called sources. They could be, for example, a few speakers of
English. There may be considerable variation within a single source, so we should imagine
a  source  to  be  represented  by  a  subset  of  the  signal  space:  its  attractor.  The  task  of  the 
system  is  to  respond  distinctively  to  each  source.  From  looking  at  a  macroscopic  feature  of 
the  system,  we  should  be  able  to  tell  when  it  has  been  presented  with  a  source  and  which 
source  it  is. 

In the simplified problem we restrict the components of the input vector to binary values
($b = 1$) and restrict the sources to single values (point attractors). Under these assumptions,
the system will be learning a subset of the numbers $\{0, \ldots, 2^n - 1\}$. The possible signal vectors can
be visualized as the corners of an $n$-dimensional hypercube, and the response of the system
will be to select one of these corners.

2.2  Encoder  Populations 

Each subsystem is an instantiation of a simple neural network called an encoder [1,11], as
shown in Figure 1. An $n_1$-$n_2$-$n_3$ encoder has $n_1$ inputs that feed into $n_2$ hidden units, which
in turn feed into $n_3$ output units. Each unit computes a weighted sum of its inputs and
compares  the  result  with  a  threshold.  If  the  sum  exceeds  the  threshold,  the  unit  is  activated 
and  outputs  a  one;  otherwise,  it  produces  a  zero. 

Originally, these networks were used to attack the encoding problem [11]. Assume that
$n_1 = n_3$ and $n_2 = \log_2 n_1$, and that the inputs consist of a single one bit, with all the rest
zeros. The position of this bit then represents one of the first $n$ natural numbers (here $n = n_1$). The
encoding problem is to learn to encode these numbers into a pattern of $\log_2 n$ bits, and also
to learn to decode this $\log_2 n$-bit pattern into an output pattern, usually identical to the
input pattern. We, however, are using the population of encoders in quite a different way.
Instead  of  finding  a  single  network  that  solves  the  encoding  problem  for  all  sources,  we  want 
to  construct  subpopulations  of  networks  that  are  specialized  for  encoding  different  sources. 

Figure 1: A 4-2-4 encoder.

In general, an encoder is a tuple

$$\xi = [\beta, U, V, \gamma]$$

where $\beta = (\beta_0, \ldots, \beta_{n_2-1})$ and $\gamma = (\gamma_0, \ldots, \gamma_{n_3-1})$ are thresholds for the hidden units and
the output units, respectively, and $U = \{u_{ij} \mid 0 \le i < n_2, 0 \le j < n_1\}$ and
$V = \{v_{ij} \mid 0 \le i < n_2, 0 \le j < n_3\}$ are weight matrices.

An encoder accepts an $n_1$-bit input vector $a$, produces an $n_2$-bit hidden vector $b$, and
then produces an $n_3$-bit output vector $c$. Each unit applies a threshold function

$$\Theta(\phi, s) = \begin{cases} 1 & \text{if } s > \phi \\ 0 & \text{otherwise} \end{cases}$$

to the sum of its weighted inputs:

$$b_i = \Theta\Bigl(\beta_i, \sum_{0 \le j < n_1} u_{ij} a_j\Bigr)$$

$$c_j = \Theta\Bigl(\gamma_j, \sum_{0 \le i < n_2} v_{ij} b_i\Bigr).$$
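
The forward pass of a single encoder can be sketched in a few lines of Python. This is only an illustration of the threshold units described above, not the Connection Machine implementation; the function names (`theta`, `forward`, `random_encoder`) and the list-of-lists weight layout are assumptions made for the example.

```python
import random


def theta(phi, s):
    """Threshold function: 1 if the weighted sum s exceeds phi, else 0."""
    return 1 if s > phi else 0


def forward(beta, U, V, gamma, a):
    """Map input bits a to (hidden bits b, output bits c).

    U[i][j] weights input j into hidden unit i; V[i][j] weights hidden
    unit i into output unit j, following the index ranges in the text.
    """
    n2, n3 = len(beta), len(gamma)
    b = [theta(beta[i], sum(U[i][j] * a[j] for j in range(len(a))))
         for i in range(n2)]
    c = [theta(gamma[j], sum(V[i][j] * b[i] for i in range(n2)))
         for j in range(n3)]
    return b, c


def random_encoder(n1, n2, n3, rng):
    """Hypothetical helper: draw every threshold and weight from [-1, 1)."""
    def draw():
        return 2.0 * rng.random() - 1.0
    beta = [draw() for _ in range(n2)]
    U = [[draw() for _ in range(n1)] for _ in range(n2)]
    V = [[draw() for _ in range(n3)] for _ in range(n2)]
    gamma = [draw() for _ in range(n3)]
    return beta, U, V, gamma


# Usage: push a one-hot input through a random 4-2-4 encoder.
rng = random.Random(0)
beta, U, V, gamma = random_encoder(4, 2, 4, rng)
b, c = forward(beta, U, V, gamma, [0, 1, 0, 0])
print(b, c)
```

A randomly initialized encoder will usually not reproduce its input; the point of the sketch is only to show how the two layers of threshold units are evaluated.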

It is essential to the genetic algorithm described below that a description of an encoder
may be decomposed into parts, called genes, in such a way that a new encoder (a child) can
be constructed with parts from two others (the parents) [5,7]. In part, we have chosen the
encoder network for this work because it can be decomposed in a fairly natural way. The
genetic structure of an encoder is illustrated in Figure 1. Each encoder has $n_2$ hidden-unit
genes and $n_3$ output-unit genes. The hidden-unit genes are the more complex of the two
types. The $i$th hidden-unit gene of an encoder $\xi$ consists of the hidden-unit threshold $\beta_i$, a
vector of input weights $(u_{ij} \mid 0 \le j < n_1)$, and a vector of hidden-unit weights $(v_{ij} \mid 0 \le j < n_3)$.
The $j$th output-unit gene consists simply of the output-unit threshold $\gamma_j$.

The system consists of a population of $N$ encoders

$$E = \{\xi_k \mid 0 \le k < N\}$$

with, in general, different thresholds and weights. We always have $n_1 = n_3$ and typically, but
not necessarily, $n_2 = \log_2 n_1$. Every encoder in the population is presented simultaneously
with the same input vector, and tries to reconstruct the input. Success is measured by a
fitness function [3,8]

$$f_k(a) = -\sum_{0 \le j < n_1} |a_j - c_j|,$$

where $c$ is the output vector that encoder $\xi_k$ produces for the input $a$.

Note  that  fitness  is  simply  the  negative  of  the  Hamming  distance  between  the  input  and 
the  output  vectors.  The  idea  behind  the  genetic  algorithm  described  below  is  to  increase 
the  frequency  of  genes  and  combinations  of  genes  in  E  by  selection,  thereby  causing  the 
population  to  learn  to  encode  the  inputs  it  sees  most  frequently. 
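
As a small worked example of the fitness measure, the following Python sketch computes the negative Hamming distance between an input and an output bit vector. The function name `fitness` and the example vectors are illustrative assumptions.

```python
def fitness(a, c):
    """Negative Hamming distance between input bits a and output bits c."""
    if len(a) != len(c):
        raise ValueError("input and output vectors must have equal length")
    return -sum(abs(aj - cj) for aj, cj in zip(a, c))


# A perfect reconstruction scores 0; each wrong bit costs 1.
source = [0, 0, 1, 0]
print(fitness(source, [0, 0, 1, 0]))  # 0
print(fitness(source, [1, 0, 1, 1]))  # -2
```

Fitness is therefore never positive, and a value of zero means that the encoder reproduced its input exactly.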

2.3 A Genetic Algorithm


Genetic  algorithms  can  be  effective  for  exploring  large  design  spaces  [5,7].  The  essential  idea 
is  to  simulate  many  generations  of  populations  of  individual  subsystems,  with  each  generation 
produced  from  previous  generations  by  selection  and  differential  reproduction  [3,4,6,10].  Each 
individual  is  graded  by  a  fitness  function  that  is  intended  to  measure  its  performance  on  one 
or  more  instances  of  a  problem.  Those  individuals  that  are  most  fit  are  selected  and  then 
a  set  of  new  subsystems  is  created  by  applying  genetic  operators  to  the  descriptions  of  the 
selected individuals. Commonly used genetic operators are called crossover and mutation,
modeled after similar processes that drive biological evolution [2,5,7]. Although the concepts
behind genetic algorithms are very general, there is inevitably a wide variety of parameters,
reproduction  schemes,  representations,  and  so  on  that  could  be  used.  Part  of  the  aim  of  this 
preliminary  work  is  to  understand  the  consequences  of  and  interactions  among  these  choices. 

Our genetic algorithm consists of an initialization, followed by an iteration of the generation operator, $Q$:

$$E_{t+1} \leftarrow Q(E_t, a_t), \quad t = 0, 1, \ldots$$

In the initialization step, a population of at least $N = 4096$ encoders (see footnote 1) with $n$ inputs and
$m$ hidden units is created. All thresholds and weights are chosen from a uniform random
distribution over the interval $[-1, 1)$. Initially, all of the members of $E$ are marked as alive and
are assigned an age chosen from a random distribution of integers in the range $[0, \ldots, \mathrm{age}_{\max} - 1]$.
Only those encoders marked as alive, denoted by $E_a$, are active and available for input,
selection, and reproduction. All encoders that are not alive are treated as available space
for the next generation. The age of $\xi$ is an integer indicating the number of generations for
which $\xi$ has been continuously alive.

Footnote 1: We use a Connection Machine with 4096 processors for our simulations. $N$ can be larger than 4096, but
must be a power of 2.


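The generation function $Q$ is defined as the following sequence of steps:

$$\begin{aligned}
\Omega &\leftarrow \mathrm{select}(f, E, a) \\
\Omega^* &\leftarrow \mathrm{reproduce}(\Omega) \\
E &\leftarrow \mathrm{insert}(\Omega^*, E) \\
E &\leftarrow \mathrm{age}(E) \\
E &\leftarrow \mathrm{kill}(E)
\end{aligned}$$

These steps can be performed in several ways, but each step has the basic characteristics
outlined below.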

Selection: $\Omega \leftarrow \mathrm{select}(f, E, a)$

An input bit vector, $a$, is chosen and presented to the system. The input can be selected in a
variety of ways. The simplest is to select the vector from a set of sources according to some
prior probability distribution. Input vectors can be degraded with noise by inverting bits
with some probability. Inputs can also be chosen randomly from the set of $2^n$ possible inputs
with some specified frequency. All living encoders are ranked by fitness and a subset $\Omega$ of
the most fit is selected. The size of $\Omega$ could be determined dynamically by a threshold on
fitness. Instead, in this preliminary investigation, we set the size of $\Omega$ as a fixed proportion
of the size of $E$ (usually 1/16).
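
The ranking step can be illustrated with a short Python sketch. It assumes that the fitness of every encoder for the current input is already available in a list, and that ties are broken by position in the population; the report does not specify tie-breaking, and the name `select` with these parameters is an assumption made for the example.

```python
def select(fitnesses, alive, fraction=1 / 16):
    """Return the indices of the most fit living encoders.

    fitnesses[k] is the fitness of encoder k for the current input and
    alive[k] is True if encoder k is alive. The selected set has size
    fraction * len(fitnesses); fewer indices are returned if too few
    encoders are alive.
    """
    living = [k for k in range(len(fitnesses)) if alive[k]]
    # Python's sort is stable, so equally fit encoders keep index order.
    living.sort(key=lambda k: fitnesses[k], reverse=True)
    size = max(1, int(len(fitnesses) * fraction))
    return living[:size]


# Usage: 16 encoders, all alive, keep the top quarter.
scores = [0, -1, -2, 0, -3, -1, 0, -4] * 2
print(select(scores, [True] * 16, fraction=0.25))  # [0, 3, 6, 8]
```

With the default proportion of 1/16, a population of 4096 encoders yields 256 selected parents per generation.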

Reproduction: $\Omega^* \leftarrow \mathrm{reproduce}(\Omega)$

Every member of $\Omega$ is paired at random with another member of $\Omega$ (possibly itself), which
is called its mate. The pairs are combined to produce a fixed number of children. The
combination is performed by applying two genetic operators, crossover and mutation. In
the crossover operation, each of a child’s genes is selected from one or the other parent with
probability 1/2, a process called free recombination [6,9]. In the mutation operation, every
gene constituent, whether a weight or a threshold, is replaced by a random value with some
probability of mutation $\mu$, which is usually quite low.
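
Free recombination and mutation can be sketched as follows. The dictionary layout for an encoder (a list of hidden-unit genes, each a threshold with its input and output weight vectors, plus a list of output-unit thresholds) is an assumed representation chosen to mirror the gene structure described earlier. Drawing mutated values from $[-1, 1)$ is likewise an assumption; the text says only that a random value is substituted.

```python
import random


def crossover(parent, mate, rng):
    """Free recombination: each gene comes from either parent with prob. 1/2."""
    hidden = []
    for gp, gm in zip(parent["hidden"], mate["hidden"]):
        beta, u, v = gp if rng.random() < 0.5 else gm
        hidden.append((beta, list(u), list(v)))  # copy so children do not alias
    output = [gp if rng.random() < 0.5 else gm
              for gp, gm in zip(parent["output"], mate["output"])]
    return {"hidden": hidden, "output": output}


def mutate(child, mu, rng):
    """Replace each weight or threshold by a random value with probability mu."""
    def maybe(x):
        return 2.0 * rng.random() - 1.0 if rng.random() < mu else x
    hidden = [(maybe(beta), [maybe(w) for w in u], [maybe(w) for w in v])
              for beta, u, v in child["hidden"]]
    output = [maybe(g) for g in child["output"]]
    return {"hidden": hidden, "output": output}


def reproduce(selected, n_children, mu, rng):
    """Pair every selected encoder with a random mate and build its children."""
    children = []
    for parent in selected:
        mate = rng.choice(selected)  # the mate may be the parent itself
        for _ in range(n_children):
            children.append(mutate(crossover(parent, mate, rng), mu, rng))
    return children
```

Note that a hidden-unit gene is inherited as a whole, so a hidden unit's threshold, input weights, and output weights always travel together through crossover; only mutation alters individual constituents.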

Insertion: $E \leftarrow \mathrm{insert}(\Omega^*, E)$

A random number $k \in \{0, \ldots, N - 1\}$ is generated for every child in $\Omega^*$. If $\xi_k$ is not alive, the
child is inserted into $E$ at that location, is marked as alive, and is assigned an age of zero. If
more than one child tries to occupy the same location, one child is chosen at random.

Aging: $E \leftarrow \mathrm{age}(E)$

The  ages  of  all  living  encoders  are  increased  by  1. 

Death: $E \leftarrow \mathrm{kill}(E)$

Every encoder whose age is greater than $\mathrm{age}_{\max}$ is marked as not alive. Its space in $E$ then
becomes available for the children in the next generation.
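
The insertion, aging, and death steps amount to bookkeeping on three parallel arrays. The sketch below uses plain Python lists; the function names and the rule that a child drawn onto an occupied slot is simply discarded are assumptions, since the text states only what happens when the slot is free.

```python
import random


def insert(children, population, alive, age, rng):
    """Place each child at a random slot; only slots that are not alive are used."""
    claims = {}
    for child in children:
        k = rng.randrange(len(population))
        if not alive[k]:
            claims.setdefault(k, []).append(child)
    for k, contenders in claims.items():
        population[k] = rng.choice(contenders)  # collisions: pick one at random
        alive[k] = True
        age[k] = 0


def age_all(alive, age):
    """Increase the age of every living encoder by one."""
    for k in range(len(age)):
        if alive[k]:
            age[k] += 1


def kill(alive, age, age_max):
    """Mark every living encoder older than age_max as not alive."""
    for k in range(len(age)):
        if alive[k] and age[k] > age_max:
            alive[k] = False


# Usage: eight slots, four of them free, ten children competing for space.
rng = random.Random(1)
population = ["old"] * 8
alive = [True] * 4 + [False] * 4
age = [30, 31, 5, 0, 0, 0, 0, 0]
insert(["c%d" % i for i in range(10)], population, alive, age, rng)
age_all(alive, age)
kill(alive, age, age_max=30)
print(alive[:4])  # [False, False, True, True]
```

Because children can land only on free slots, the living fraction of the population is set by the balance between the number of children produced and the age limit.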

2.4  Results 

When interpreting the performance of the system, we consider only those encoders that can
reconstruct their inputs perfectly. These are said to respond to the input; that is, $r_k(a) = 1$,
where

$$r_k(a) = \max(0, 1 + f_k(a)).$$

We  want  many  networks  to  respond  to  the  sources,  few  or  none  to  respond  to  nonsource 
signals,  and  different  subpopulations  to  respond  to  each  different  source. 

Two measures of the effectiveness of the system depend on computing the probability
distribution $P(a|r)$, which is the probability that the signal is $a$, given that a randomly
chosen encoder is responding. This distribution is computed assuming no prior knowledge of
the frequency of occurrence of the source. Therefore, using a uniform (maximum entropy)
distribution of priors

$$P(a) = \frac{1}{2^n}$$

and writing the probability of an encoder responding to $a$ as

$$P(r|a) = \frac{\sum_k r_k(a)}{N}$$

and the probability of an encoder responding to any signal as

$$P(r) = \frac{\sum_x \sum_k r_k(x)}{N 2^n},$$

we use Bayes’s Rule to determine the desired distribution:

$$P(a|r) = \frac{P(r|a) P(a)}{P(r)},$$

or

$$P(a|r) = \frac{\sum_k r_k(a)}{\sum_x \sum_k r_k(x)}.$$

Ideally, this distribution should be identical to the prior probability $P(a)$ after many generations.
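
The distribution can be computed directly from a table of responses. The Python sketch below assumes a hypothetical table `responses[x][k]` holding the response of encoder `k` to signal `x` for all $2^n$ signals, which is practical only for small $n$; the function names are illustrative.

```python
def response(f):
    """Response derived from fitness: 1 for a perfect reconstruction, else 0."""
    return max(0, 1 + f)


def posterior(responses):
    """Return P(x|r) for every signal x under the uniform prior.

    responses[x][k] is 1 if encoder k responds to signal x and 0 otherwise.
    """
    counts = [sum(row) for row in responses]
    total = sum(counts)
    if total == 0:
        raise ValueError("no encoder responds to any signal")
    return [count / total for count in counts]


# Usage: four signals, three encoders.
table = [[1, 1, 0],
         [0, 0, 0],
         [1, 0, 0],
         [0, 0, 1]]
print(posterior(table))  # [0.5, 0.0, 0.25, 0.25]
```

The factors $N$ and $2^n$ cancel in Bayes’s Rule, so only the response counts are needed.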

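We can compute the entropy of $P(a|r)$

$$S = -\sum_x P(x|r) \log_2 P(x|r)$$

to summarize the degree of organization of the system in terms of the uncertainty associated
with its response. We can also compute the correlation between $P(a|r)$ and some prior model
distribution $P_m(a)$ from which the sources were chosen:

$$C = \frac{\sum_x \bigl(P(x|r) - \overline{P(x|r)}\bigr)\bigl(P_m(x) - \overline{P_m(x)}\bigr)}{\sqrt{\sum_x \bigl(P(x|r) - \overline{P(x|r)}\bigr)^2 \sum_x \bigl(P_m(x) - \overline{P_m(x)}\bigr)^2}},$$

where an overbar denotes the mean over all $x$.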
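
Both summary measures are easy to compute once the distribution is available as a list. The sketch below assumes that the correlation is the ordinary Pearson coefficient taken over all signals, and it returns zero when either distribution has no variance; that convention is a choice made for the example, not something stated in the text.

```python
import math


def entropy(p):
    """Entropy in bits of a distribution given as a list of probabilities."""
    return -sum(px * math.log2(px) for px in p if px > 0)


def correlation(p, pm):
    """Pearson correlation between two distributions over the same signals."""
    n = len(p)
    mean_p, mean_m = sum(p) / n, sum(pm) / n
    num = sum((a - mean_p) * (b - mean_m) for a, b in zip(p, pm))
    den = math.sqrt(sum((a - mean_p) ** 2 for a in p)
                    * sum((b - mean_m) ** 2 for b in pm))
    return num / den if den > 0 else 0.0


# Usage: 4-bit inputs (16 signals), four equally likely sources.
model = [0.25] * 4 + [0.0] * 12
print(entropy(model))  # 2.0
print(correlation(model, model))
```

A population that responds equally to four sources and to nothing else therefore has an entropy of two bits, while a population that responds to a single signal has an entropy of zero.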

The first three experiments described below use entropy and correlation to examine the
evolution of the system under different conditions. Because the time required to compute
$P(a|r)$ grows exponentially with the length of the input vector, $n$, these experiments were
done only on small 4-2-4 encoders. The fourth experiment examines the behavior of the
system when $n$ is larger and, in particular, when the number of possible inputs greatly exceeds
the size of the population. Finally, the fifth experiment examines whether the population
becomes specialized to the sources.

Figure 2: Typical behavior (no mutation)

2.4.1  Experiment  1:  Typical  Behavior  (no  mutation) 

The first experiment examines the typical behavior of a population of 16K 4-2-4 encoders
with no mutation ($\mu = 0$). The inputs were chosen at random with equal frequency from a
set of four sources. Figure 2 shows the entropy of $P(a|r)$ over 1000 generations when the
maximum number of children $n_c$ is 2 and 4 ((a) and (b), respectively). Also shown is the size
of the population that is living.

In both cases the entropy eventually drops to the ideal value of $\log_2 4 = 2$, which is the
entropy of the model distribution. The correlation with the model distribution (not shown)
is very nearly 1 after only about 20 generations. The fraction of the population that is living
fluctuates at first, but eventually approaches some limit, which is greater for the $n_c = 4$ case.

2.4.2 Experiment 2: Changing Environment

Figure 3: Changing environment


The previous simple experiment illustrated that adaptation can occur without mutation,
relying only on the crossover operation. This experiment shows that mutation is essential
in a more challenging problem. Figure 3 shows the entropy and the correlation measures
when the system is successively stimulated with two different sets of four signals, $L_1$ and $L_2$.
Two cases are shown: $\mu = 0$ and $\mu = 0.01$. The interesting feature of this experiment is
that in the first case, $\mu = 0$, the system “collapses” into an irreversible condition of total
insensitivity on the third presentation of the set $L_1$. The entropy drops to zero, indicating
that the system can respond to no signals (or possibly to only one), and the correlation with
the model distribution drops effectively to zero. Apparently, the successive presentations
and epochs of selection have eliminated variation in $E$. Selection for $L_1$ eliminates genes
effective for $L_2$, selection for $L_2$ eliminates genes effective for $L_1$, and so on, until by the third
presentation of $L_1$, $E$ has been so depleted that it cannot adapt.


Figure 4: Effects of noise (four panels; horizontal axis: generations, 0 to 1000)

In the case of $\mu = 0.01$ this does not happen. Even this low rate of mutation is sufficient
to maintain adequate variation in $E$. The crossover operation is effective for making large
jumps through the space of genotypes, while mutation is effective as a continual source of
variation.

2.4.3  Experiment  3:  Effects  of  Noise 

Experiment 3 examines the effects of noise in the input. The population size is 4K, the
encoders are 4-2-4, four different sources are used with equal probability, $\mu = 0.01$, $n_c = 4$,
and $\mathrm{age}_{\max} = 30$. Each encoder is presented with an input vector, selected from the four
sources, but each vector has a probability $P_n$ of having (at least) one bit changed at random.
All encoders receive input from the same source, but the inputs are corrupted by noise
independently, so that any two encoders may see different signals. Figure 4 shows four cases:
$P_n = 0.1, 0.2, 0.25, 0.4$. Entropy is shown above and correlation below. The shaded portions
of the correlation graphs indicate when the system is working, in the sense that the four
signals of highest probability are identical to the sources. The system performs well up to
$P_n = 0.2$ but degrades quickly for higher noise levels.

2.4.4  Experiment  4:  Large  n 

To test the system on a larger problem, and in particular on a problem in which the number of
possible signals greatly exceeds the size of $E$, we performed a simulation with 16-4-16 encoders
and eight sources. As in the previous simulation, the population size is 4K, $\mu = 0.01$, $n_c = 4$,
and $\mathrm{age}_{\max} = 30$. Because the number of possible inputs is $2^{16} = 64\mathrm{K}$, it is not practical to
compute the complete distribution $P(a|r)$, especially not for every generation. Instead, we let
the system run for 4,000 generations and then counted the number of encoders that responded,
averaged over all eight sources, which was 488.5, and the number of encoders that
responded, averaged over 1000 randomly chosen signals, which was 0.13.


2.4.5  Experiment  5:  Specialization 


The last experiment examines whether the population divides into disjoint subpopulations
specialized for the sources. Suppose we have $s$ sources with $R_i$ being the subpopulation of
encoders that respond to source $i$. The following equation gives a normalized measure of the
overlap between two subpopulations:

$$O_{ij} = \frac{|R_i \cap R_j|}{|R_i \cup R_j|}, \quad 0 \le i, j < s.$$

Ideally, $O_{ij}$ should be one if $i = j$ and zero otherwise for complete specialization. Figure 5
shows matrices of overlap measures for four cases. When we adapt 4-2-4 encoders to only
two sources, shown in Figure 5 (a), no specialization occurs at all: nearly every encoder
that responds to one source also responds to the other. When we adapt the same system
to four sources (b) or seven sources (c), there is some specialization, with relatively more
specialization occurring when there are more sources. Finally, when we adapt a system of
16-2-16 encoders to ten sources, Figure 5 (d), the specialization is nearly perfect, with only
two subpopulations having a significant degree of overlap.

Figure 5: Specialization
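
The overlap measure is the ratio of intersection size to union size, which maps naturally onto Python sets of encoder indices. In the sketch below, the function names and the convention that two empty subpopulations have zero overlap are assumptions.

```python
def overlap(r_i, r_j):
    """Normalized overlap between two sets of responding encoder indices."""
    union = r_i | r_j
    if not union:
        return 0.0  # assumed convention when neither source has responders
    return len(r_i & r_j) / len(union)


def overlap_matrix(subpopulations):
    """Matrix of pairwise overlaps for a list of subpopulations."""
    return [[overlap(r_i, r_j) for r_j in subpopulations]
            for r_i in subpopulations]


# Usage: three sources; encoder 2 responds to both of the first two.
subpops = [{0, 1, 2}, {2, 3}, {4}]
for row in overlap_matrix(subpops):
    print(row)
```

Complete specialization corresponds to an identity matrix, whereas a matrix with every entry close to one corresponds to the unspecialized two-source case described above.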

2.5  Summary  of  Accomplishments 

For the encoding problem, the evolutionary algorithm exhibits effective adaptation. Differential
reproduction amplifies the frequency of selected genes and leads to the emergence of a
population  that  is  progressively  more  fit.  In  our  model,  free  recombination  (crossover)  seems 
to  be  the  primary  means  of  adaptation.  Two  relatively  fit  parents  clearly  have  a  better-than- 
average  chance  of  producing  more  fit  offspring.  Mutation,  on  the  other  hand,  has  only  an 
average  chance  of  producing  an  offspring  that  is  more  fit,  regardless  of  the  parents’  fitness. 
However,  by  itself  free  recombination  causes  a  progressive  loss  of  information:  those  genes 
that  are  amplified  replace  others  that  are  lost  forever.  This  loss  of  diversity  in  the  gene 
pool  is  disastrous  if  the  ensemble  of  sources  changes,  as  demonstrated  in  Experiment  2.  The 
mutation  operator  continuously  injects  diversity  into  the  gene  pool,  thereby  preventing  the 
system  from  becoming  trapped  in  a  low-diversity  dead  end. 

Our  approach  differs  from  some  genetic-algorithm  and  neural-network  approaches  in  a 
fundamental  way.  We  do  not  seek  an  individual  encoder  that  is  “most  fit”  overall;  instead, 
we  seek  subpopulations  of  networks  that  have  specialized  their  responses  to  particular  sources. 
The  response  of  the  system  is  an  aggregate,  macroscopic  feature  of  the  individual  responses 
of  a  large  population  of  individual,  interacting  subsystems.  We  view  fitness  as  a  very  general 
concept:  simply  a  measure  of  the  similarity  between  the  input  and  the  output.  Rather  than 
being built into the fitness function, the evolutionary trend toward specialization is instead
an emergent property of the population as a whole, and a consequence of the informational
bottleneck in the encoders. Unlike the more standard optimization methods for designing
systems,  this  method  results  in  subpopulations  that  resemble  species  adapted  to  different 
ecological  niches  that  are  determined  by  the  sources. 

We  would  like  to  simulate  populations  with  more  diverse  features,  such  as  variable  sizes, 
reproduction  rates,  age  limits,  and  mutation  rates.  Currently,  these  properties  are  global 
to  all  encoders,  but  they  could  be  variable,  inherited  properties,  represented  as  “modifier 
genes”  attached  to  the  basic  encoder  genotype.  We  speculate  that  this  process  will  lead  to 
more  interesting  adaptation  because  it  will  create  more  niches  for  adaptation  to  fill.  For 
example,  one  can  imagine  relatively  large,  scarce,  long-lived  encoders  specializing  on  complex 
sources that appear infrequently or change slowly, or relatively small, numerous, short-lived,
and  perhaps  highly  mutable  encoders  specializing  on  common,  simple  sources.  A  further 
interesting  possibility  is  the  coevolution  of  interacting  populations  in  symbiotic  or  parasitic 
relationships  [10]. 

We are changing the input representation to the more general case of $b$-bit samples so
that we can investigate applications to real, physical sources. Whether the approach can be
extended to more complex sources than point attractors is an open question. To do so, the
basic encoder representation may have to be extended to a more elaborate, dynamic network.
Instead of an encoder, we may need a generator whose internal state allows it to recognize
and mimic (i.e., predict) a source with a low number of dimensions.

3  Publications  and  Presentations 

A paper by Stephen T. Barnard and Aviv Bergman will be published in the Proceedings of
Parallel Problem Solving from Nature, a workshop held in Germany in October 1990. Aviv
Bergman also participated in the international workshop on Evolution and Complex System,
in Torino, Italy, in July 1990. This workshop included fruitful discussion among several of
the world’s top researchers into complex systems and evolution.